Under-Approximating Expected Total Rewards in POMDPs
Authors
Abstract
We consider the problem: is the optimal expected total reward to reach a goal state in a partially observable Markov decision process (POMDP) below a given threshold? We tackle this (generally undecidable) problem by computing under-approximations on these total rewards. This is done by abstracting finite unfoldings of the infinite belief MDP of the POMDP. The key issue is to find a suitable under-approximation of the value function. We provide two techniques: a simple (cut-off) technique that uses a good policy on the POMDP, and a more advanced technique (belief clipping) that uses minimal shifts of probabilities between beliefs. We use mixed-integer linear programming (MILP) to find such minimal probability shifts and experimentally show that our techniques scale quite well while providing tight lower bounds on the expected total reward.
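The cut-off idea in the abstract can be illustrated on a toy example: unfold the belief MDP of a POMDP to a fixed depth and assign every frontier belief a sound under-approximating value. The sketch below is ours, not code from the paper; the POMDP, its numbers, and all names are illustrative assumptions, and it uses the trivial cut-off value 0, which is a valid lower bound here because all rewards are non-negative (the paper's cut-off instead uses the value of a good POMDP policy, and its belief-clipping technique is not shown).

```python
from collections import defaultdict

# Illustrative toy POMDP: two indistinguishable states s0, s1 and an
# observable goal state. All names and numbers here are our assumptions.
T = {  # T[state][action] = list of (successor, probability)
    "s0": {"a": [("goal", 0.5), ("s1", 0.5)], "b": [("s0", 1.0)]},
    "s1": {"a": [("goal", 0.8), ("s0", 0.2)], "b": [("s1", 1.0)]},
}
OBS = {"s0": "mid", "s1": "mid", "goal": "done"}
GOAL_REWARD = 1.0  # reward collected on entering the goal


def successors(belief, action):
    """Split the successor distribution of a belief by observation.

    Returns (expected immediate reward, {obs: (prob, conditional belief)}).
    """
    imm = 0.0
    branches = defaultdict(lambda: defaultdict(float))
    for s, p in belief.items():
        for s2, q in T[s][action]:
            if s2 == "goal":
                imm += p * q * GOAL_REWARD
            branches[OBS[s2]][s2] += p * q
    out = {}
    for obs, dist in branches.items():
        prob = sum(dist.values())
        out[obs] = (prob, {s: w / prob for s, w in dist.items()})
    return imm, out


def lower_bound(belief, depth):
    """Under-approximate the optimal expected total reward by unfolding
    the belief MDP to a fixed depth; frontier beliefs get cut-off value 0,
    which is sound here because all rewards are non-negative."""
    if depth == 0:
        return 0.0  # cut-off at the frontier of the finite unfolding
    best = 0.0
    for action in ("a", "b"):
        imm, branches = successors(belief, action)
        val = imm
        for obs, (prob, nb) in branches.items():
            if obs != "done":  # goal is absorbing; no further reward
                val += prob * lower_bound(nb, depth - 1)
        best = max(best, val)
    return best


b0 = {"s0": 0.5, "s1": 0.5}  # uniform initial belief over s0, s1
bounds = [lower_bound(b0, k) for k in range(1, 6)]
```

Deeper unfoldings yield monotonically non-decreasing lower bounds that approach the true optimal value (here 1.0, since the goal is reached almost surely under action "a"), mirroring how larger abstractions tighten the under-approximation.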
Similar resources
Continuous Time Markov Decision Processes with Expected Discounted Total Rewards
Abstract. This paper discusses continuous time Markov decision processes with criterion of expected discounted total rewards, where the state space is countable, the reward rate function is extended real-valued and the discount rate is a real number. Under necessary conditions that the model is well defined, the state space is partitioned into three subsets, on which the optimal value function ...
Counterexamples for Expected Rewards
The computation of counterexamples for probabilistic systems has gained a lot of attention during the last few years. All of the proposed methods focus on the situation when the probabilities of certain events are too high. In this paper we investigate how counterexamples for properties concerning expected costs (or, equivalently, expected rewards) of events can be computed. We propose methods ...
Genetic Algorithms for Approximating Solutions to POMDPs
We use genetic algorithms (GAs) to find good finite horizon policies for POMDPs, where the search is limited to policies with a fixed finite amount of policy memory. Initial results were presented in (Lusena et al. 1999) with one GA. In this paper, different cross-over and mutation rates are compared. Initializing the population of the genetic algorithm is done using smaller genetic algorithms. The sele...
POMDPs under Probabilistic Semantics
We consider partially observable Markov decision processes (POMDPs) with limitaverage payoff, where a reward value in the interval [0, 1] is associated to every transition, and the payoff of an infinite path is the long-run average of the rewards. We consider two types of path constraints: (i) quantitative constraint defines the set of paths where the payoff is at least a given threshold λ1 ∈ (...
Policy Iteration Algorithms for DEC-POMDPs with Discounted Rewards
Over the past seven years, researchers have been trying to find algorithms for the decentralized control of multiple agents under uncertainty. Unfortunately, most of the standard methods are unable to scale to real-world-size domains. In this paper, we come up with promising new theoretical insights to build scalable algorithms with provable error bounds. In the light of the new theoretical insi...
Journal
Journal title: Lecture Notes in Computer Science
Year: 2022
ISSN: ['1611-3349', '0302-9743']
DOI: https://doi.org/10.1007/978-3-030-99527-0_2